
Implement a new --failing-and-slow-first command line argument to test runner. #24624


Open
wants to merge 6 commits into
base: main

juj
Collaborator

@juj juj commented Jun 26, 2025

This keeps track of the results of the previous test run. On subsequent runs, failing tests are run first, then skipped tests, and finally successful tests in slowest-first order. Starting the slowest tests early improves the parallel throughput of the suite.
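The ordering described above can be sketched as a sort key. This is a minimal illustration, not the code in this PR: the function name `sort_tests`, the shape of `previous_results`, and the choice to run tests without history alongside the failing ones are all assumptions.

```python
def sort_tests(test_names, previous_results):
    """Order tests: failed first, then skipped, then passed (slowest first).

    previous_results is a hypothetical mapping:
        name -> {"result": "failed" | "skipped" | "passed", "duration": seconds}
    """
    rank = {"failed": 0, "skipped": 1, "passed": 2}

    def key(name):
        prev = previous_results.get(name)
        if prev is None:
            # Assumption: a test with no recorded history is ranked with failures.
            return (0, 0.0)
        # Negative duration puts the slowest test first within each group.
        return (rank[prev["result"]], -prev["duration"])

    # sorted() is stable, so tests with equal keys keep their original order.
    return sorted(test_names, key=key)


previous = {
    "test_a": {"result": "passed", "duration": 1.0},
    "test_b": {"result": "failed", "duration": 0.5},
    "test_c": {"result": "passed", "duration": 5.0},
    "test_d": {"result": "skipped", "duration": 0.0},
}
print(sort_tests(["test_a", "test_b", "test_c", "test_d"], previous))
# ['test_b', 'test_d', 'test_c', 'test_a']
```

With an empty history every test gets the same key, so the original order is kept and a first run is unaffected.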

Add support for --failfast in the multithreaded test suite, so that a suite run stops quickly at the first test failure.
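One way to get fail-fast behaviour in a parallel run is a shared stop flag that each worker checks before starting a test. The sketch below uses a thread pool purely for illustration; `run_failfast` and its result strings are hypothetical names, and the runner in this PR may be structured differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_failfast(tests, max_workers=4):
    """Run (name, callable) pairs in parallel, skipping the rest after a failure.

    Returns a dict: name -> "passed" | "failed" | "not run".
    """
    stop = threading.Event()

    def run_one(name, fn):
        # Tests that have not started yet are skipped once a failure is seen.
        if stop.is_set():
            return name, "not run"
        try:
            fn()
            return name, "passed"
        except Exception:
            stop.set()  # Signal all workers to stop picking up new tests.
            return name, "failed"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, name, fn) for name, fn in tests]
        return dict(f.result() for f in futures)


def passing():
    pass


def failing():
    raise AssertionError("boom")
```

Tests already in flight when the flag is set still run to completion; only tests that have not started are skipped. Combined with a failing-first ordering, the first failure tends to surface near the start of the run.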

Used together, the --failfast and --failing-and-slow-first flags can help bring test suite runs on CI down to under 10 seconds when the suite is failing.

Example runtime of the core0 suite, run as test/runner core0 on a 16-core/32-thread system:

Total core time: 2818.016s. Wallclock time: 118.083s. Parallelization: 23.86x.

Runtime of the same suite with test/runner --failing-and-slow-first core0:

Total core time: 2940.180s. Wallclock time: 94.027s. Parallelization: 31.27x.

This gives better throughput and a 20.37% reduction in test suite wall time.
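The reported figures can be checked with a few lines of arithmetic, assuming that parallelization is total core time divided by wallclock time, which is consistent with the numbers above. The helper names are illustrative only.

```python
def parallelization(core_time_s, wall_time_s):
    """Average number of cores kept busy over the run."""
    return core_time_s / wall_time_s


def relative_change_percent(before, after):
    """Signed percentage change from before to after."""
    return (after - before) / before * 100.0


baseline = parallelization(2818.016, 118.083)
reordered = parallelization(2940.180, 94.027)
wall_change = relative_change_percent(118.083, 94.027)

print(f"{baseline:.2f}x")      # 23.86x
print(f"{reordered:.2f}x")     # 31.27x
print(f"{wall_change:.2f}%")   # -20.37%
```

Total core time rises slightly in the second run (2818.016s to 2940.180s), so the gain comes from keeping the cores busier, not from doing less work.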

juj added 6 commits June 26, 2025 19:33
…t runner. This keeps track of results of previous test run, and on subsequent runs, failing tests are run first, then skipped tests, and last, successful tests in slowest-first order. Add support for --failfast in the multithreaded test suite. This improves parallelism throughput of the suite, and helps stop at test failures quickly.
Collaborator

@sbc100 sbc100 left a comment


IIUC this is what I currently use --failfast --continue for. The downside of --failfast --continue of course is that it doesn't work for parallel testing (so I also add -j1).

@@ -35,3 +35,6 @@ coverage.xml
# ...except the templates.
!/tools/run_python.ps1
!/tools/run_python_compiler.ps1

# Test runner previous run results for sorting the next run
__previous_test_run_results.json
Collaborator


This should probably go in out/ along with the existing last_test.txt file (used to implement --continue).
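A minimal sketch of reading and writing such a results file at a configurable path, for example under out/. The helper names and the JSON layout are assumptions for illustration; the format the PR actually writes is not shown in this thread.

```python
import json
import os


def load_previous_results(path):
    """Return the stored results, or an empty dict if there is no usable file."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # First run, or a corrupt file: behave as if there is no history.
        return {}


def save_results(path, results):
    """Write results as JSON, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
```

Passing the path in as a parameter keeps the choice of location (repository root or out/) out of the load and save logic.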

Collaborator

@sbc100 sbc100 left a comment


Actually maybe I misunderstood. I use --failfast --continue when implementing new features and wanting to fix each test failure as I run into it.

How does this improve CI times on the bots? It seems like it would not affect the first run, but only subsequent runs, which the bots don't do, do they?

2 participants